Abstract

We  present  a  way  of  defining  the  subtype  relation  that  ensures  that  subtype  objects  preserve  behavioral 
properties  of  their  supertypes.  The  subtype  relation  is  based  on  the  specifications  of  the  sub-  and  supertypes. 
Our  approach  handles  mutable  types  and  allows  subtypes  to  have  more  methods  than  their  supertypes. 
Dealing  with  mutable  types  and  subtypes  that  extend  their  supertypes  has  surprising  consequences  on  how 
to  specify  and  reason  about  objects.  In  our  approach,  we  discard  the  standard  data  type  induction  rule, 
we  prohibit  the  use  of  an  analogous  “history”  rule,  and  we  make  up  for  both  losses  by  adding  explicit 
predicates, namely invariants and constraints, to our type specifications. We also discuss the ramifications of our
approach to subtyping on the design of type families.
1  Introduction


What  does  it  mean  for  one  type  to  be  a  subtype  of  another?  We  argue  that  this  is  a  semantic  question 
having  to  do  with  the  behavior  of  the  objects  of  the  two  types:  the  objects  of  the  subtype  ought  to  behave 
the  same  as  those  of  the  supertype  as  far  as  anyone  or  any  program  using  supertype  objects  can  tell. 

For example, in strongly typed object-oriented languages such as Simula 67 [DMN70], C++ [Str86],
Modula-3 [Nel91], and Trellis/Owl [SCB+86], subtypes are used to broaden the assignment statement. An
assignment 

x:  T  :=  E 

is  legal  provided  the  type  of  expression  E  is  a  subtype  of  the  declared  type  T  of  variable  x.  Once  the 
assignment  has  occurred,  x  will  be  used  according  to  its  “apparent”  type  T,  with  the  expectation  that  if  the 
program  performs  correctly  when  the  actual  type  of  x’s  object  is  T,  it  will  also  work  correctly  if  the  actual 
type  of  the  object  denoted  by  x  is  a  subtype  of  T. 

Clearly  subtypes  must  provide  the  expected  methods  with  compatible  signatures.  This  consideration  has 
led to the formulation of the contra/covariance rules [BHJ+87, SCB+86, Car88]. However, these rules are
not  strong  enough  to  ensure  that  the  program  containing  the  above  assignment  will  work  correctly  for  any 
subtype  of  T,  since  all  they  do  is  ensure  that  no  type  errors  will  occur.  It  is  well  known  that  type  checking, 
while  very  useful,  captures  only  a  small  part  of  what  it  means  for  a  program  to  be  correct;  the  same  is 
true  for  the  contra/covariance  rules.  For  example,  stacks  and  queues  might  both  have  a  put  method  to  add 
an  element  and  a  get  method  to  remove  one.  According  to  the  contravariance  rule,  either  could  be  a  legal 
subtype  of  the  other.  However,  a  program  written  in  the  expectation  that  x  is  a  stack  is  unlikely  to  work 
correctly  if  x  actually  denotes  a  queue,  and  vice  versa. 
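
The stack and queue example can be made concrete with a minimal Python sketch. The class names `Stack` and `Queue` and the client function `most_recent` are hypothetical and chosen only for illustration; the point is that both classes offer the same `put`/`get` signatures, yet a client written with stack behavior in mind gets a different answer from a queue.

```python
from collections import deque

class Stack:
    """LIFO collection: get removes the most recently added element."""
    def __init__(self):
        self._items = []
    def put(self, x):
        self._items.append(x)
    def get(self):
        return self._items.pop()

class Queue:
    """FIFO collection with the same method signatures as Stack."""
    def __init__(self):
        self._items = deque()
    def put(self, x):
        self._items.append(x)
    def get(self):
        return self._items.popleft()

def most_recent(collection):
    """Client written in the expectation that `collection` is a stack."""
    collection.put(1)
    collection.put(2)
    return collection.get()  # a stack yields 2; a queue yields 1
```

Signature checking accepts both calls, `most_recent(Stack())` and `most_recent(Queue())`, but only the first meets the client's expectation.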

What  is  needed  is  a  stronger  requirement  that  constrains  the  behavior  of  subtypes:  properties  that  can 
be  proved  using  the  specification  of  an  object’s  presumed  type  should  hold  even  though  the  object  is  actually 
a  member  of  a  subtype  of  that  type: 

Subtype Requirement: Let $\phi(x)$ be a property provable about objects x of type T. Then $\phi(y)$
should be true for objects y of type S where S is a subtype of T.

A  type’s  specification  determines  what  properties  we  can  prove  about  objects. 

We  are  interested  only  in  safety  properties  (“nothing  bad  happens”).  First,  properties  of  an  object’s 
behavior  in  a  particular  program  must  be  preserved:  to  ensure  that  a  program  continues  to  work  as  expected, 
calls  of  methods  made  in  the  program  that  assume  the  object  belongs  to  a  supertype  must  have  the  same 
behavior  when  the  object  actually  belongs  to  a  subtype.  In  addition,  however,  properties  independent 
of  particular  programs  must  be  preserved  because  these  are  important  when  independent  programs  share 
objects.  We  focus  on  two  kinds  of  such  properties:  invariants,  which  are  properties  true  of  all  states,  and 
history  properties,  which  are  properties  true  of  all  sequences  of  states.  We  formulate  invariants  as  predicates 
over  single  states  and  history  properties,  over  pairs  of  states.  For  example,  an  invariant  property  of  a  bag  is 
that its size never exceeds its bound; a history property is that the bag’s bound does not change. We
do not address other kinds of safety properties of computations, e.g., the existence of an object in a state,
the number of objects in a state, or the relationship between objects in a state, since these do not have
to do with the meanings of types. We also do not address liveness properties (“something good eventually
happens”),  e.g.,  the  size  of  a  bag  will  eventually  reach  the  bound. 

This  chapter  provides  a  general,  yet  easy  to  use,  definition  of  the  subtype  relation  that  satisfies  the 
Subtype  Requirement.  Our  approach  handles  mutable  types  and  allows  subtypes  to  have  more  methods 
than  their  supertypes.  Dealing  with  mutable  types  and  subtypes  that  extend  their  supertypes  has  surprising 
consequences  on  how  to  specify  and  reason  about  objects.  In  our  approach,  we  discard  the  standard  data 
type  induction  rule,  we  prohibit  the  use  of  an  analogous  “history”  rule,  and  we  make  up  for  both  losses  by 
adding explicit predicates to our type specifications. Our specifications are formal, which means that they
have  a  precise  mathematical  meaning  that  serves  as  a  firm  foundation  for  reasoning.  Our  specifications  can 
also  be  used  informally  as  described  in  [LG85]. 

Our  definition  applies  in  a  very  general  distributed  environment  in  which  possibly  concurrent  users  share 
mutable  objects.  Our  approach  is  also  constructive:  One  can  prove  whether  a  subtype  relation  holds  by 
proving  a  small  number  of  simple  lemmas  based  on  the  specifications  of  the  two  types. 
The chapter also explores the ramifications of the subtype relation and shows how interesting type families
can  be  defined.  For  example,  arrays  are  not  a  subtype  of  sequences  (because  the  user  of  a  sequence  expects 
it  not  to  change  over  time)  and  32-bit  integers  are  not  a  subtype  of  64-bit  integers  (because  a  user  of  64-bit 
integers would expect certain method calls to succeed that will fail when applied to 32-bit integers). However,
type  families  can  be  defined  that  group  such  related  types  together  and  thus  allow  generic  routines  to  be 
written  that  work  for  all  family  members.  Our  approach  makes  it  particularly  easy  to  define  type  families: 
It  emphasizes  the  properties  that  all  family  members  must  preserve,  and  it  does  not  require  the  introduction 
of  unnecessary  methods  (i.e.,  methods  that  the  supertype  would  not  naturally  have). 

The  chapter  is  organized  as  follows.  Section  2  discusses  in  more  detail  what  we  require  of  our  subtype 
relation  and  provides  the  motivation  for  our  approach.  We  describe  our  model  of  computation  in  Section  3 
and present our specification method in Section 4. We give a formal definition of subtyping in Section 5 and
discuss its ramifications for designing type hierarchies in Section 6. We describe related work in Section 7
and  summarize  our  contributions  in  Section  8. 


2  Motivation 

To  motivate  the  basic  idea  behind  our  notion  of  subtyping,  let’s  look  at  an  example.  Consider  a  bounded 
bag  type  that  provides  a  put  method  that  inserts  elements  into  a  bag  and  a  get  method  that  removes  an 
arbitrary  element  from  a  bag.  Put  has  a  pre-condition  that  checks  to  see  that  adding  an  element  will  not 
grow  the  bag  beyond  its  bound;  get  has  a  pre-condition  that  checks  to  see  that  the  bag  is  non-empty. 

Consider also a bounded stack type that has, in addition to push and pop methods, a swap_top method
that takes an integer, i, and modifies the stack by replacing its top with i. Stack’s push and pop methods
have pre-conditions similar to bag’s put and get, and swap_top has a pre-condition requiring that the stack
is non-empty.

Intuitively,  stack  is  a  subtype  of  bag  because  both  kinds  of  collections  behave  similarly.  The  main 
difference is that the get method for bags does not specify precisely what element is removed; the pop
method for stack is more constrained, but what it does is one of the permitted behaviors for bag’s get
method. Let’s ignore swap_top for the moment.

Suppose we want to show stack is a subtype of bag. We need to relate the values of stacks to those of
bags. This can be done by means of an abstraction function, like that used for proving the correctness of
implementations [Hoa72]. A given stack value maps to a bag value where we abstract from the insertion
order on the elements.

We  also  need  to  relate  stack’s  methods  to  bag’s.  Clearly  there  is  a  correspondence  between  stack’s  push 
method and bag’s put and similarly for the pop and get methods (even though the names of the corresponding
methods  do  not  match).  The  pre-  and  post-conditions  of  corresponding  methods  will  need  to  relate  in  some 
precise  (to  be  defined)  way.  In  showing  this  relationship  we  need  to  appeal  to  the  abstraction  function  so 
that  we  can  reason  about  stack  values  in  terms  of  their  corresponding  bag  values. 

Finally, what about swap_top? Most other definitions of the subtype relation have ignored such “extra”
methods, and it is perfectly adequate to do so when programs are considered in isolation and there is no aliasing.
In such a constrained situation, a program that uses an object that is apparently a bag but is actually a stack
will never call the extra methods, and therefore their behavior is irrelevant. However, we cannot ignore extra
methods in the presence of aliasing, or in a general computational environment that allows sharing
of mutable objects by multiple users or processes. In particular, we need to pay attention to extra mutator
methods (like swap_top) that modify their object.

Consider  first  the  case  of  aliasing.  The  problem  here  is  that  within  a  program  an  object  is  accessible  by 
more  than  one  name,  so  that  modifications  using  one  of  the  names  are  visible  when  the  object  is  accessed 
using the other name. For example, suppose $\sigma$ is a subtype of $\tau$ and that variables

x: $\tau$

y: $\sigma$

both denote the same object (which must, of course, belong to $\sigma$ or one of its subtypes). When the object
is accessed through x, only $\tau$ methods can be called. However, when it is used through y, $\sigma$ methods can be
called and if these methods are mutators, their effects will be visible later when the object is accessed via
x. To reason about the use of variable x using the specification of its type $\tau$, we need to impose additional
constraints on the subtype relation.

Now  consider  the  case  of  an  environment  of  shared  mutable  objects,  such  as  is  provided  by  object-oriented 
databases  (e.g.,  Thor  [Lis92]  and  Gemstone  [MS90]).  In  such  systems,  there  is  a  universe  containing  shared, 
mutable  objects  and  a  way  of  naming  those  objects.  In  general,  lifetimes  of  objects  may  be  longer  than 
the  programs  that  create  and  access  them  (i.e.,  objects  might  be  persistent)  and  users  (or  programs)  may 
access  objects  concurrently  and/or  aperiodically  for  varying  lengths  of  time.  Of  course  there  is  a  need  for 
some  form  of  concurrency  control  in  such  an  environment.  We  assume  such  a  mechanism  is  in  place,  and 
consider  a  computation  to  be  made  up  out  of  atomic  units  (i.e.,  transactions)  that  exclude  one  another.  The 
transactions  of  different  computations  can  be  interleaved  and  thus  one  computation  is  able  to  observe  the 
modifications  made  by  another. 

If  there  were  subtyping  in  such  an  environment  the  following  situation  might  occur.  A  user  installs  a 
directory  object  that  maps  string  names  to  bags.  Later,  a  second  user  enters  a  stack  into  the  directory  under 
some  string  name;  such  a  binding  is  analogous  to  assigning  a  subtype  object  to  a  variable  of  the  supertype. 
After  this,  both  users  occasionally  access  the  stack  object.  The  second  user  knows  it  is  a  stack  and  accesses 
it using stack methods. The question is: What does the first user need to know in order for his or her
programs  to  make  sense? 

We  think  it  ought  to  be  sufficient  for  a  user  to  know  only  about  the  “apparent”  type  of  the  object;  the 
subtype  ought  to  preserve  any  properties  that  can  be  proved  about  the  supertype.  In  particular,  the  first 
user  ought  to  be  able  to  reason  about  his  or  her  use  of  the  stack  object  using  invariant  and  history  properties 
of  bag. 

Our  approach  achieves  this  goal  by  adding  information  to  type  specifications.  To  handle  invariants,  we 
add an invariant clause; to handle history properties, a constraint clause. Showing that $\sigma$ is a subtype of $\tau$
requires showing that (under the abstraction function) $\sigma$’s invariant implies $\tau$’s invariant and $\sigma$’s constraint
implies $\tau$’s constraint.

For  example,  for  the  bag  and  stack  example,  the  two  invariants  are  identical:  both  state  that  the  size  of 
the  bag  (stack)  is  less  than  or  equal  to  its  bound.  Similarly,  the  two  constraints  are  identical:  both  state  that 
the  bound  of  the  bag  (or  stack)  does  not  change.  Showing  that  stack’s  invariant  and  constraint  respectively 
imply bag’s invariant and constraint is trivial. The extra method swap_top is permitted because even though
it  changes  the  stack’s  contents,  it  preserves  stack’s  invariant  and  constraint. 
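
The shared-object scenario can be illustrated with a small Python sketch. It is only an illustration under assumptions: the class `BoundedStack`, its fields `items` and `limit`, and the helper names are hypothetical, and the two predicates are runtime checks on one sample execution rather than proofs. One user reaches the object through alias `x` and reasons with bag's invariant and constraint; another reaches it through alias `y` and calls the extra mutator.

```python
class BoundedStack:
    """Bounded stack of integers; `limit` plays the role of bag's bound."""
    def __init__(self, limit):
        self.items = []
        self.limit = limit

    def push(self, i):  # corresponds to bag's put
        assert len(self.items) < self.limit, "pre-condition: room for one more"
        self.items.append(i)

    def pop(self):  # corresponds to bag's get
        assert self.items, "pre-condition: non-empty"
        return self.items.pop()

    def swap_top(self, i):  # extra mutator, not a bag method
        assert self.items, "pre-condition: non-empty"
        self.items[-1] = i

def bag_invariant(obj):
    """Single-state predicate: size never exceeds the bound."""
    return len(obj.items) <= obj.limit

def bag_constraint(bound_before, obj):
    """Two-state predicate: the bound seen earlier equals the bound now."""
    return bound_before == obj.limit

def shared_use():
    y = BoundedStack(limit=2)  # second user knows the object is a stack
    x = y                      # first user's alias, regarded as a bag
    x.push(1)
    bound_seen = x.limit       # first user observes the bound
    y.swap_top(7)              # mutation through the other alias
    return bag_invariant(x), bag_constraint(bound_seen, x), x.pop()
```

The call to `swap_top` changes which element the first user later removes, which bag's specification permits, while the size bound and the fixed bound continue to hold.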

In  Section  5  we  present  and  discuss  our  subtype  definition.  First,  however,  we  define  our  model  of 
computation,  and  then  discuss  specifications,  since  these  define  the  objects,  values,  and  methods  that  will 
be  related  by  the  subtype  relation. 

3  Model  of  Computation 

We assume a set of all potentially existing objects, Obj, partitioned into disjoint typed sets. Each object
has a unique identity. A type defines a set of values for an object and a set of methods that provide the only
means to manipulate that object. Effectively Obj is a set of unique identifiers for all objects that can contain
values. 

Objects  can  be  created  and  manipulated  in  the  course  of  program  execution.  A  state  defines  a  value  for 
each  existing  object.  It  is  a  pair  of  mappings,  an  environment  and  a  store.  An  environment  maps  program 
variables  to  objects;  a  store  maps  objects  to  values. 

State = Env $\times$ Store
Env = Var $\rightarrow$ Obj
Store = Obj $\rightarrow$ Val

Given a variable, x, and a state, $\rho$, with an environment, $\rho.e$, and store, $\rho.s$, we use the notation $x_\rho$ to denote
the value of x in state $\rho$; i.e., $x_\rho = \rho.s(\rho.e(x))$. When we refer to the domain of a state, $dom(\rho)$, we mean
more precisely the domain of the store in that state.
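
As a rough illustration of this state model, the following Python sketch represents an environment and a store as dictionaries. The names `State`, `value_of`, and `dom`, and the string identifiers used for variables and objects, are assumptions made for the example and are not part of the formal model.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class State:
    env: dict    # Env: maps program variables to objects
    store: dict  # Store: maps objects to values

def value_of(x, state):
    """Value of variable x in a state: look up its object, then the object's value."""
    return state.store[state.env[x]]

def dom(state):
    """Domain of a state, i.e. the domain of its store."""
    return set(state.store)

# Two variables that denote the same object share one store entry.
rho = State(env={"x": "obj1", "y": "obj1"}, store={"obj1": ("some value", 3)})
```

Because `x` and `y` map to the same object, `value_of("x", rho)` and `value_of("y", rho)` agree, which is exactly the aliasing situation discussed in Section 2.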

We model a type as a triple, (O, V, M), where $O \subseteq Obj$ is a set of objects, $V \subseteq Val$ is a set of values,
and M is a set of methods. Each method for an object is a producer, an observer, or a mutator. Producers
of an object of type $\tau$ return new objects of type $\tau$; observers return results of other types; mutators modify
objects of type $\tau$. An object is immutable if its value cannot change and otherwise it is mutable; a type is
immutable if its objects are and otherwise it is mutable. Clearly a type can be mutable only if some of its
methods are mutators. We allow mixed methods where a producer or an observer can also be a mutator.
We also allow methods to signal exceptions; we assume termination exceptions, i.e., each method call either
terminates normally or in one of a number of named exception conditions. To be consistent with object-oriented
language notation, we write x.m(a) to denote the call of method m on object x with the sequence
of arguments a.

Objects come into existence and get their initial values through creators. (These are often called
constructors in the literature.) Unlike other kinds of methods, creators do not belong to particular objects, but
rather  are  independent  operations. 

A computation, i.e., program execution, is a sequence of alternating states and transitions starting in
some initial state, $\rho_0$:

$$\rho_0 \; Tr_1 \; \rho_1 \; \ldots \; Tr_n \; \rho_n$$

Each transition, $Tr_i$, of a computation sequence is a partial function on states; we assume the execution of
each transition is atomic. A history is the subsequence of states of a computation; we use $\rho$ and $\psi$ to range
over states in any computation, c, where $\rho$ precedes $\psi$ in c. The value of an object can change only through
the  invocation  of  a  mutator;  in  addition  the  environment  can  change  through  assignment  and  the  domain  of 
the  store  can  change  through  the  invocation  of  a  creator  or  producer. 

Objects  are  never  destroyed: 

$$\forall\, 1 \le i \le n \;.\; dom(\rho_{i-1}) \subseteq dom(\rho_i)$$
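
The rule that objects are never destroyed can be phrased as a check over a recorded history. The sketch below is an illustration only: the function name `never_destroys` is hypothetical, and a history is assumed to be given as a list of stores (dictionaries from objects to values) in execution order.

```python
def never_destroys(history):
    """True if each store's domain contains the domain of the store before it."""
    return all(set(prev) <= set(cur) for prev, cur in zip(history, history[1:]))
```

For instance, a history whose stores have domains {}, {a}, {a, b} passes the check, while one that goes from {a} back to {} does not.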

4  Specifications 

4.1  Type  Specifications 

A  type  specification  includes  the  following  information: 

•  The  type’s  name; 

•  A  description  of  the  type’s  value  space; 

•  A  definition  of  the  type’s  invariant  and  history  properties; 

•  For  each  of  the  type’s  methods: 

-  Its  name; 

-  Its  signature  (including  signaled  exceptions); 

-  Its behavior in terms of pre-conditions and post-conditions.

Note  that  the  creators  are  missing.  Omitting  creators  allows  subtypes  to  provide  different  creators  than 
their  supertypes.  In  addition,  omitting  creators  makes  it  easy  for  a  type  to  have  multiple  implementations, 
allows  new  creators  to  be  added  later,  and  reflects  common  usage;  for  example,  Java  interfaces  and  virtual 
types provide no way for users to create objects of the type. We show how to specify creators in Section 4.2.

In  our  work  we  use  formal  specifications  in  the  two-tiered  style  of  Larch  [GHW85].  The  first  tier  defines 
sorts,  which  are  used  to  define  the  value  spaces  of  objects.  In  the  second  tier,  Larch  interfaces  are  used  to 
define  types. 

For example, Figure 1 gives a specification for a bag type whose objects have methods put, get, card,
and equal. The uses clause defines the value space for the type by identifying a sort. The clause in the
figure indicates that values of objects of type bag are denotable by terms of sort B introduced in the BBag
specification; a value of this sort is a pair, (elems, bound), where elems is a mathematical multiset of integers
and bound is a natural number. The notation { } stands for the empty multiset, $\cup$ is a commutative
operation on multisets that does not discard duplicates, $\in$ is the membership operation, and $| x |$ is a
cardinality operation that returns the total number of elements in the multiset x. These operations as well
as equality ($=$) and inequality ($\neq$) are all defined in BBag.

The  invariant  clause  contains  a  single-state  predicate  that  defines  the  type's  invariant  properties.  The 
constraint  clause  contains  a  two-state  predicate  that  defines  the  type's  history  properties.  We  will  discuss 
these  clauses  in  more  detail  in  subsequent  sections. 


bag = type

    uses BBag (bag for B)
    for all b: bag

    invariant $| b_\rho.elems | \le b_\rho.bound$
    constraint $b_\rho.bound = b_\psi.bound$

    put = proc (i: int)
        requires $| b_{pre}.elems | < b_{pre}.bound$
        modifies b
        ensures $b_{post}.elems = b_{pre}.elems \cup \{i\} \wedge b_{post}.bound = b_{pre}.bound$

    get = proc ( ) returns (int)
        requires $b_{pre}.elems \neq \{\}$
        modifies b
        ensures $b_{post}.elems = b_{pre}.elems - \{result\} \wedge result \in b_{pre}.elems \wedge b_{post}.bound = b_{pre}.bound$

    card = proc ( ) returns (int)
        ensures $result = | b_{pre}.elems |$

    equal = proc (a: bag) returns (bool)
        ensures $result = (a = b)$

end bag

Figure  1:  A  Type  Specification  for  Bags 
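
To make the bag specification more tangible, here is a minimal Python sketch that treats the requires and ensures clauses as runtime assertions. This is an illustrative reading, not the Larch semantics: the class name `Bag` and the use of `collections.Counter` as the multiset representation are assumptions, and `get` simply removes whichever element iteration yields first, which is one of the behaviors the specification allows.

```python
from collections import Counter

class Bag:
    def __init__(self, bound):
        self.elems = Counter()  # multiset of integers
        self.bound = bound

    def card(self):
        """Observer: total number of elements, duplicates included."""
        return sum(self.elems.values())

    def put(self, i):
        assert self.card() < self.bound, "requires |elems| < bound"
        pre_elems, pre_bound = self.elems.copy(), self.bound
        self.elems[i] += 1
        # ensures: elems grows by exactly {i}; the bound is unchanged
        assert self.elems == pre_elems + Counter([i])
        assert self.bound == pre_bound

    def get(self):
        assert self.card() > 0, "requires elems != {}"
        pre_elems, pre_bound = self.elems.copy(), self.bound
        result = next(iter(self.elems))  # an arbitrary element
        self.elems -= Counter([result])  # in-place form drops zero counts
        # ensures: result was present, is removed once, and the bound is unchanged
        assert pre_elems[result] > 0
        assert self.elems == pre_elems - Counter([result])
        assert self.bound == pre_bound
        return result

    def equal(self, other):
        """Identity comparison: are the two bags the same object?"""
        return other is self
```

Note that `equal` compares object identity rather than values, mirroring the specification's use of the object names a and b instead of their values.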


The body of a type specification provides a specification for each method. Since a method's specification
needs to refer to the method's object, we introduce a name for that object in the for all line. We use result
to name a method's result parameter. In the requires and ensures clauses x stands for an object, $x_{pre}$ for
its value in the initial state, and $x_{post}$ for its value in the final state.¹ Distinguishing between initial and
final values is necessary only for mutable types, so we suppress the subscripts for parameters of immutable
types (like integers). We need to distinguish between an object, x, and its value, $x_{pre}$ or $x_{post}$, because
we sometimes need to refer to the object itself, e.g., in the equal method, which determines whether two
(mutable) bags are the same object.

A method m’s pre-condition, denoted m.pre, is the predicate that appears in its requires clause; e.g.,
put’s pre-condition checks to see that adding an element will not enlarge the bag beyond its bound. If the
clause is missing, the pre-condition is trivially “true.”

¹ Note that pre and post are implicitly universally quantified variables over states. Also, more formally, $x_{pre}$ stands for
$pre.s(pre.e(x))$; $x_{post}$ stands for $post.s(post.e(x))$.

A method m's post-condition, denoted m.post, is the conjunction of the predicates given by its modifies
and ensures clauses. A modifies $x_1, \ldots, x_n$ clause is shorthand for the predicate:

$$\forall x \in (dom(pre) - \{x_1, \ldots, x_n\}) \;.\; x_{pre} = x_{post}$$

which says only objects listed may change in value. A modifies clause is a strong statement about all
objects not explicitly listed, i.e., their values may not change; if there is no modifies clause then nothing
may change. For example, card’s post-condition says that it returns the size of the bag and no objects
(including the bag) change, and put’s post-condition says that the bag’s value changes by the addition of its
integer argument, and no other objects change.
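
The predicate behind a modifies clause can be sketched as a check over a pair of stores. This is a hedged illustration: the function name `satisfies_modifies` and the dictionary representation of stores are assumptions introduced here.

```python
def satisfies_modifies(pre_store, post_store, modified):
    """True if every object in dom(pre) that is not listed keeps its value.

    pre_store / post_store: dicts mapping objects to values.
    modified: the set of objects named in the modifies clause
              (the empty set when the clause is absent).
    """
    return all(
        obj in post_store and post_store[obj] == value
        for obj, value in pre_store.items()
        if obj not in modified
    )
```

With `pre = {"b": (1,), "c": (2,)}` and `post = {"b": (1, 3), "c": (2,)}`, the check succeeds for `modified = {"b"}` and fails for the empty set, since the value of b changed.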

Methods  may  terminate  normally  or  exceptionally;  the  exceptions  are  listed  in  a  signals  clause  in  the 
method’s  header.  For  example,  instead  of  the  get  method  we  might  have  had 

get' = proc ( ) returns (int) signals (empty)
    modifies b
    ensures if $b_{pre}.elems = \{\}$ then signal empty
            else $b_{post}.elems = b_{pre}.elems - \{result\} \wedge result \in b_{pre}.elems \wedge b_{post}.bound = b_{pre}.bound$

4.2  Specifying  Creators 

Objects  are  created  and  initialized  through  creators.  Figure  2  shows  specifications  for  three  different  creators 
for  bags.  The  first  creator  creates  a  new  empty  bag  whose  bound  is  its  integer  argument.  The  second  and 
third  creators  fix  the  bag’s  bound  to  be  100.  The  third  creator  uses  its  integer  argument  to  create  a  singleton 
bag. The assertion new(x) stands for the predicate:

$$x \in dom(post) - dom(pre)$$

Recall that objects are never destroyed so that $dom(pre) \subseteq dom(post)$.


bag_create = proc (n: int) returns (bag)
    requires $n > 0$
    ensures new(result) $\wedge\ result_{post} = (\{\}, n)$

bag_create_small = proc ( ) returns (bag)
    ensures new(result) $\wedge\ result_{post} = (\{\}, 100)$

bag_create_single = proc (i: int) returns (bag)
    ensures new(result) $\wedge\ result_{post} = (\{i\}, 100)$

Figure  2:  Creator  Specifications  for  Bags 
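
The three creators and the new(result) assertion can be sketched in Python as free-standing functions over an explicit store. The representation is an assumption for illustration: objects are fresh integers drawn from a counter, bag values are (multiset, bound) pairs, and the helper names `_allocate` and `is_new` are hypothetical.

```python
from collections import Counter
from itertools import count

_fresh_ids = count()  # source of object identities never used before

def _allocate(store, value):
    """Add a fresh object with the given initial value to the store."""
    obj = next(_fresh_ids)
    store[obj] = value
    return obj

def bag_create(store, n):
    assert n > 0, "requires n > 0"
    return _allocate(store, (Counter(), n))

def bag_create_small(store):
    return _allocate(store, (Counter(), 100))

def bag_create_single(store, i):
    return _allocate(store, (Counter([i]), 100))

def is_new(obj, pre_dom, post_dom):
    """new(obj): obj is in the final domain but not in the initial one."""
    return obj in (post_dom - pre_dom)
```

Creators are ordinary functions here, not methods of a bag object, which matches the text's point that creators do not belong to particular objects.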


4.3  Type  Specifications  Need  Explicit  Invariants 

By  not  including  creators  in  type  specifications  and  by  allowing  subtypes  to  extend  supertypes  with  mutators 
we  lose  a  powerful  reasoning  tool:  data  type  induction.  Data  type  induction  is  used  to  prove  type  invariants. 
The  base  case  of  the  rule  requires  that  each  creator  of  the  type  establish  the  invariant;  the  inductive  case 
requires  that  each  method  (in  particular  each  mutator)  preserve  the  invariant.  Without  the  creators,  we  have 
no base case. Without knowing all mutators of type $\tau$ (as added by $\tau$’s subtypes), we have an incomplete
inductive  case.  With  no  data  type  induction  rule,  we  cannot  prove  type  invariants! 

To  compensate  for  the  lack  of  a  data  type  induction  rule,  we  state  the  invariant  explicitly  in  the  type 
specification  through  an  invariant  clause;  if  the  invariant  is  trivial  (i.e.,  identical  to  “true”),  the  clause  can 
be omitted. The invariant defines the legal values of its type $\tau$. For example, we include

invariant $| b_\rho.elems | \le b_\rho.bound$

in the type specification of Figure 1 to state that the size of a bounded bag never exceeds its bound. The
predicate $\phi(x_\rho)$ appearing in an invariant clause for type $\tau$ stands for the predicate: For all computations,
c, and all states $\rho$ in c,

$$\forall x : \tau \;.\; x \in dom(\rho) \Rightarrow \phi(x_\rho)$$

Any  additional  invariant  property  must  follow  from  the  conjunction  of  the  type’s  invariant  and  invariants 
that  hold  for  the  entire  value  space.  For  example,  we  could  show  that  the  size  of  a  bag  is  nonnegative  because 
this  is  true  for  all  mathematical  multiset  values. 

As  part  of  specifying  a  type  and  its  creators  we  must  show  that  the  invariant  holds  for  all  objects  of  the 
type. All creators for a type $\tau$ must establish $\tau$’s invariant, $I_\tau$:

For each creator for type $\tau$, show for all $x : \tau$ that $I_\tau[result_{post}/x_\rho]$,

where P[a/b] stands for predicate P with every occurrence of b replaced by a. Similarly, each producer must
establish the invariant on its newly-created object. In addition, each mutator of the type must preserve the
invariant. To prove this, we assume each mutator is called on an object of type $\tau$ with a legal value (one
that satisfies the invariant), and show that any value of a $\tau$ object it modifies is legal:

For each mutator m of $\tau$, for all $x : \tau$ assume $I_\tau[x_{pre}/x_\rho]$ and show $I_\tau[x_{post}/x_\rho]$.

For  example,  we  would  need  to  show  that  the  three  creators  for  bag  establish  the  invariant,  and  that 
put  and  get  preserve  the  invariant  for  bag.  (We  can  ignore  card  and  equal  because  they  are  observers.) 
Informally  the  invariant  holds  because  each  creator  guarantees  that  the  size  is  no  larger  than  the  bound; 
put’s pre-condition checks that there is enough room in the bag for another element; and get either decreases
the  size  of  the  bag  or  leaves  it  the  same. 
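
The shape of these proof obligations can be illustrated by checking them on sample values. The Python sketch below is a test on a few cases, not a proof: bag values are modeled as (multiset, bound) pairs, the mutators are written as pure functions from an old value to a new one, and all names are hypothetical.

```python
from collections import Counter

def size(value):
    elems, _ = value
    return sum(elems.values())

def invariant(value):
    """Bag's invariant: the size never exceeds the bound."""
    return size(value) <= value[1]

def put(value, i):
    elems, bound = value
    assert size(value) < bound, "pre-condition of put"
    return (elems + Counter([i]), bound)

def get(value):
    elems, bound = value
    assert size(value) > 0, "pre-condition of get"
    result = next(iter(elems))
    return (elems - Counter([result]), bound), result

# Initial values produced by three creators (bounds 3, 100, 100).
CREATED = [(Counter(), 3), (Counter(), 100), (Counter([7]), 100)]

def obligations_hold():
    for value in CREATED:
        if not invariant(value):       # each creator establishes the invariant
            return False
        after_put = put(value, 1)      # every sample has room for one more
        if not invariant(after_put):   # put preserves it
            return False
        after_get, _ = get(after_put)
        if not invariant(after_get):   # get preserves it
            return False
    return True
```

The observers card and equal are omitted because they do not change any value and therefore cannot break the invariant.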

The  loss  of  data  type  induction  means  that  additional  invariants  cannot  be  proved.  Therefore  the  specifier 
must  be  careful  to  define  an  invariant  that  is  strong  enough  that  all  desired  invariants  follow  from  it. 

4.4  Type  Specifications  Need  Explicit  Constraints 

We  are  interested  in  the  history  properties  of  objects  in  addition  to  their  invariant  properties.  We  can 
formulate  history  properties  as  predicates  over  state  pairs,  and  prove  them  using  the  history  rule: 

History Rule: For each of the i mutators m of $\tau$, for all $x : \tau$:

$$\frac{m_i.pre \wedge m_i.post \Rightarrow \phi[x_{pre}/x_\rho,\; x_{post}/x_\psi]}{\phi(x_\rho, x_\psi)}$$

We  cannot  use  this  history  rule  directly,  however.  It  is  incomplete  since  subtypes  may  define  additional 
mutators.  If  we  use  it  without  considering  the  extra  mutators,  it  is  easy  to  prove  properties  that  do  not  hold 
for  subtype  objects! 

To compensate for the lack of the history rule, we state history properties explicitly in the type
specification through a constraint clause²; if the constraint is trivial, the clause can be omitted. For example,
the constraint

constraint $b_\rho.bound = b_\psi.bound$

in the specification of bag declares that a bag’s bound never changes. As another example, consider a fat_set
object that has an insert but no delete method; fat_sets only grow in size. The constraint for fat_set would
be:

constraint $\forall i : int \;.\; i \in s_\rho \Rightarrow i \in s_\psi$

The predicate $\phi(x_\rho, x_\psi)$ appearing in a constraint clause for type $\tau$ stands for the predicate: For all
computations, c, and all states $\rho$ and $\psi$ in c such that $\rho$ precedes $\psi$,

$$\forall x : \tau \;.\; x \in dom(\rho) \Rightarrow \phi(x_\rho, x_\psi)$$

² The use of the term “constraint” is borrowed from the Ina Jo specification language [SH92], which also includes constraints
in specifications.

Note that we do not require that $\psi$ be the immediate successor of $\rho$ in c.
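
Because a constraint relates every earlier state to every later one, not only adjacent states, a check over a recorded history must consider all ordered pairs. The following Python sketch illustrates this under assumptions: a history is a list of one object's successive values, each value is a dictionary with hypothetical keys `elems` (a set) and `bound`, and the two example predicates correspond to the bag and grow-only-set constraints above.

```python
from itertools import combinations

def satisfies_constraint(history, phi):
    """True if phi(earlier, later) holds for every pair of states in order.

    combinations preserves input order, so each pair is (earlier, later).
    """
    return all(phi(earlier, later) for earlier, later in combinations(history, 2))

def bound_fixed(before, after):
    """The bound does not change."""
    return before["bound"] == after["bound"]

def only_grows(before, after):
    """Every element present earlier is still present later."""
    return before["elems"] <= after["elems"]

history = [
    {"elems": {1}, "bound": 3},
    {"elems": {1, 2}, "bound": 3},
    {"elems": {2}, "bound": 3},   # element 1 was removed
]
```

On this history the bound constraint holds for all pairs, while the grow-only constraint fails as soon as the third state is compared with either earlier one.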

Just  as  we  had  to  prove  that  methods  preserve  the  invariant,  we  must  show  that  they  satisfy  the  constraint. 
This  is  done  by  using  the  history  rule  for  each  mutator. 

The  loss  of  the  history  rule  is  analogous  to  the  loss  of  a  data  type  induction  rule.  A  practical  consequence 
of  not  having  a  history  rule  is  that  the  specifier  must  make  the  constraint  strong  enough  so  that  all  desired 
history  properties  follow  from  it. 

5  The  Meaning  of  Subtype 

5.1  Specifying  Subtypes 

To  state  that  a  type  is  a  subtype  of  some  other  type,  we  simply  append  a  subtype  clause  to  its  specification. 
We  allow  multiple  supertypes;  there  would  be  a  separate  subtype  clause  for  each.  An  example  is  given  in 
Figure  3. 

A subtype’s value space may be different from its supertype’s. For example, in the figure the sort, $S$,
for bounded stack values is defined in BStack as a pair, $\langle items, limit \rangle$, where items is a sequence of integers
and  limit  is  a  natural  number.  The  invariant  indicates  that  the  length  of  the  stack’s  sequence  component  is 
less  than  or  equal  to  its  limit.  The  constraint  indicates  that  the  stack’s  limit  does  not  change.  In  the  pre- 
and  post-conditions,  [  ]  stands  for  the  empty  sequence,  ||  is  concatenation,  last  picks  off  the  last  element  of 
a  sequence,  and  allButLast  returns  a  new  sequence  with  all  but  the  last  element  of  its  argument. 

Under the subtype clause we define an abstraction function, $A$, that relates stack values to bag values
by relying on the helping function, mk_elems, that maps sequences to multisets in the obvious manner. (We
will revisit this abstraction function in Section 5.3.) The subtype clause also lets specifiers relate subtype
methods to those of the supertype. The subtype must provide all methods of its supertype; we refer to these
as the inherited methods. Inherited methods can be renamed, e.g., push for put; all other methods of the
supertype are inherited without renaming, e.g., equal. In addition to the inherited methods, the subtype
may also have some extra methods, e.g., swap_top. (Stack’s equal method must take a bag as an argument
to satisfy the contravariance requirement. We discuss this issue further in Section 6.1.)
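
The abstraction function can be sketched in Python. This is an illustrative sketch only: the class names
StackValue and BagValue are hypothetical, a multiset is modeled with collections.Counter, and the union in
the definition of mk_elems is read as multiset union (counts add).

```python
from collections import Counter
from typing import NamedTuple, Tuple


class StackValue(NamedTuple):
    items: Tuple[int, ...]  # sequence of integers, top of stack last
    limit: int


class BagValue(NamedTuple):
    elems: Counter          # multiset of integers
    bound: int


def mk_elems(seq):
    """Map a sequence to the multiset of its elements."""
    return Counter(seq)


def A(st):
    """Abstraction function: stack value -> bag value (partial)."""
    if len(st.items) > st.limit:
        raise ValueError("A is defined only for values satisfying the stack invariant")
    return BagValue(mk_elems(st.items), st.limit)


b = A(StackValue((3, 1, 3), 5))
print(b.elems[3], b.elems[1], b.bound)   # 2 1 5
# Many-to-one: stacks that differ only in element order map to the same bag value.
print(A(StackValue((1, 3, 3), 5)) == b)  # True
```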


5.2  Definition  of  Subtype 

The formal definition of the subtype relation, $\preceq$, is given in Figure 4. It relates two types, $\sigma$ and $\tau$, each of
whose specifications respectively preserves its invariant, $I_\sigma$ and $I_\tau$, and satisfies its constraint, $C_\sigma$ and $C_\tau$.
In the rules, since $x$ is an object of type $\sigma$, its value ($x_{pre}$ or $x_{post}$) is a member of $S$ and therefore cannot be
used directly in the predicates about $\tau$ objects (which are in terms of values in $T$). The abstraction function
$A$ is used to translate these values so that the predicates about $\tau$ objects make sense. $A$ may be partial,
need not be onto, but can be many-to-one. We require that an abstraction function be defined for all legal
values of the subtype (although it need not be defined for values that do not satisfy the subtype invariant).
Moreover, it must map legal values of the subtype to legal values of the supertype.

The first clause addresses the need to relate inherited methods of the subtype. Our formulation is similar
to America’s [Ame90]. The first two signature rules are the standard contra/covariance rules. The exception
rule says that $m_\sigma$ may not signal more than $m_\tau$, since a caller of a method on a supertype object should not
expect to handle an unknown exception. The pre- and post-condition rules are the intuitive counterparts to
the contravariant and covariant rules for signatures. The pre-condition rule ensures the subtype’s method
can be called at least in any state required by the supertype. The post-condition rule says that the subtype
method’s post-condition can be stronger than the supertype method’s post-condition; hence, any property
that can be proved based on the supertype method’s post-condition also follows from the subtype method’s
post-condition.

The second clause addresses preserving program-independent properties. The invariant rule and the
assumption that the type specification preserves the invariant suffice to argue that invariant properties of a
supertype are preserved by the subtype. The argument for the preservation of a supertype’s history properties

Footnote: We do not mean that the subtype inherits the code of these methods but simply that it provides
methods with the same behavior (as defined below) as the corresponding supertype methods.
stack = type

uses BStack (stack for S)
for all s: stack

    invariant $length(s_\rho.items) \le s_\rho.limit$
    constraint $s_\rho.limit = s_\psi.limit$

    push = proc (i: int)
        requires $length(s_{pre}.items) < s_{pre}.limit$
        modifies s
        ensures $s_{post}.items = s_{pre}.items \,\|\, [\,i\,] \;\wedge\; s_{post}.limit = s_{pre}.limit$

    pop = proc () returns (int)
        requires $s_{pre}.items \ne [\;]$
        modifies s
        ensures $result = last(s_{pre}.items) \;\wedge\; s_{post}.items = allButLast(s_{pre}.items) \;\wedge\; s_{post}.limit = s_{pre}.limit$

    swap_top = proc (i: int)
        requires $s_{pre}.items \ne [\;]$
        modifies s
        ensures $s_{post}.items = allButLast(s_{pre}.items) \,\|\, [\,i\,] \;\wedge\; s_{post}.limit = s_{pre}.limit$

    height = proc () returns (int)
        ensures $result = length(s_{pre}.items)$

    equal = proc (t: bag) returns (bool)
        ensures $result = (s = t)$

    subtype of bag (push for put, pop for get, height for card)
        $\forall st : S \,.\; A(st) = \langle mk\_elems(st.items),\; st.limit \rangle$
        where $mk\_elems : Seq \rightarrow M$
            $\forall i : Int,\; sq : Seq$
                $mk\_elems([\;]) = \{\;\}$
                $mk\_elems(sq \,\|\, [\,i\,]) = mk\_elems(sq) \cup \{i\}$

end stack

Figure 3: Stack Type
Definition of the subtype relation, $\preceq$: $\sigma = \langle O_\sigma, S, M \rangle$ is a subtype of $\tau = \langle O_\tau, T, N \rangle$ if
there exists an abstraction function, $A : S \rightarrow T$, and a renaming map, $R : M \rightarrow N$, such that:

1. Subtype methods preserve the supertype methods’ behavior. If $m_\tau$ of $\tau$ is the corresponding
   renamed method $m_\sigma$ of $\sigma$, the following rules must hold:

   • Signature rule.

     - Contravariance of arguments. $m_\tau$ and $m_\sigma$ have the same number of arguments. If
       the list of argument types of $m_\tau$ is $\alpha_i$ and that of $m_\sigma$ is $\beta_i$, then $\forall i \,.\; \alpha_i \preceq \beta_i$.

     - Covariance of result. Either both $m_\tau$ and $m_\sigma$ have a result or neither has. If there
       is a result, let $m_\tau$’s result type be $\alpha$ and $m_\sigma$’s be $\beta$. Then $\beta \preceq \alpha$.

     - Exception rule. The exceptions signaled by $m_\sigma$ are contained in the set of exceptions
       signaled by $m_\tau$.

   • Methods rule. For all $x : \sigma$:

     - Pre-condition rule. $m_\tau.pre[A(x_{pre})/x_{pre}] \Rightarrow m_\sigma.pre$

     - Post-condition rule. $m_\sigma.post \Rightarrow m_\tau.post[A(x_{pre})/x_{pre},\, A(x_{post})/x_{post}]$

2. Subtypes preserve supertype properties. For all computations, $c$, and all states $\rho$ and $\psi$ in $c$
   such that $\rho$ precedes $\psi$, for all $x : \sigma$:

   • Invariant Rule. Subtype invariants ensure supertype invariants.

     $I_\sigma \Rightarrow I_\tau[A(x_\rho)/x_\rho]$

   • Constraint Rule. Subtype constraints ensure supertype constraints.

     $C_\sigma \Rightarrow C_\tau[A(x_\rho)/x_\rho,\, A(x_\psi)/x_\psi]$

Figure 4: Definition of the Subtype Relation


is completely analogous, using the constraint rule and the assumption that the type specification satisfies its
constraint. 

We do not include the invariant in the methods (or constraint) rule directly. For example, the
pre-condition rule could have been

$(m_\tau.pre[A(x_{pre})/x_{pre}] \;\wedge\; I_\tau[A(x_{pre})/x_{pre}]) \Rightarrow m_\sigma.pre$

We  omit  adding  the  invariant  because  if  it  is  needed  in  doing  a  proof  it  can  always  be  assumed,  since  it  is 
known  to  be  true  for  all  objects  of  its  type. 

Note that in the various rules we require $x : \sigma$, yet $x$ appears in predicates concerning $\tau$ objects as well.
This makes sense because $\sigma \preceq \tau$.

5.3  Applying  the  Definition  of  Subtyping  as  a  Checklist

Proofs of the subtype relation are usually obvious and can be done by inspection. Typically, the only
interesting part is the definition of the abstraction function; the other parts of the proof are usually straightforward.
However,  this  section  goes  through  the  steps  of  an  informal  proof  just  to  show  what  kind  of  reasoning  is 
involved.  Formal  versions  of  these  informal  proofs  are  given  in  [LW92]. 

Let's revisit the stack and bag example using our definition as a checklist. Here

$\sigma = \langle O_{stack},\, S,\, \{push,\, pop,\, swap\_top,\, height,\, equal\} \rangle$

$\tau = \langle O_{bag},\, B,\, \{put,\, get,\, card,\, equal\} \rangle$

Recall that we represent a bounded bag's value as a pair, $\langle elems, bound \rangle$, of a multiset of integers and a fixed
bound, and a bounded stack's value as a pair, $\langle items, limit \rangle$, of a sequence of integers and a fixed bound. It
can easily be shown that each specification preserves its invariant and satisfies its constraint.
We use the abstraction function and the renaming map given in the specification for stack in Figure 3.
The abstraction function states that for all $st : S$

$A(st) = \langle mk\_elems(st.items),\; st.limit \rangle$

where the helping function, $mk\_elems : Seq \rightarrow M$, maps sequences to multisets such that for all $sq : Seq$,
$i : Int$:

$mk\_elems([\;]) = \{\;\}$

$mk\_elems(sq \,\|\, [\,i\,]) = mk\_elems(sq) \cup \{i\}$

$A$ is partial; it is defined only for sequence-natural number pairs, $\langle items, limit \rangle$, where limit is greater than
or equal to the size of items.

The  renaming  map  R  is 

R(push)  =  put 
R(pop)  =  get 
R(height)  =  card
R(equal)  =  equal 

Checking  the  signature  and  exception  rules  is  easy  and  could  be  done  by  the  compiler. 

Next, we show the correspondences between push and put, between pop and get, etc. Let’s look at the pre-
and post-condition rules for just one method, push. Informally, the pre-condition rule for put/push requires
that we show:

$|A(s_{pre}).elems| < A(s_{pre}).bound \;\Rightarrow\; length(s_{pre}.items) < s_{pre}.limit$

Intuitively,  the  pre-condition  rule  holds  because  the  length  of  stack  is  the  same  as  the  size  of  the  corresponding 
bag  and  the  limit  of  the  stack  is  the  same  as  the  bound  for  the  bag.  Here  is  an  informal  proof  with  slightly 
more  detail: 

1. $A$ maps the stack’s sequence component to the bag’s multiset by putting all elements of the sequence
into the multiset. Therefore the length of the sequence $s_{pre}.items$ is equal to the size of the multiset
$A(s_{pre}).elems$.

2. Also, $A$ maps the limit of the stack to the bound of the bag so that $s_{pre}.limit = A(s_{pre}).bound$.

3. From put's pre-condition we know $|A(s_{pre}).elems| < A(s_{pre}).bound$.

4. push's pre-condition holds by substituting equals for equals.

Note  the  role  of  the  abstraction  function  in  this  proof.  It  allows  us  to  relate  stack  and  bag  values,  and 
therefore  we  can  relate  predicates  about  bag  values  to  those  about  stack  values  and  vice  versa.  Also,  note 
how  we  depend  on  A  being  a  function  (in  step  (4)  where  we  use  the  substitutivity  property  of  equality). 

The  post-condition  rule  requires  that  we  show  push's  post-condition  implies  put's.  We  can  deal  with  the 
modifies  and  ensures  parts  separately.  The  modifies  part  holds  because  the  same  object  is  mentioned  in 
both  specifications.  The  ensures  part  follows  from  the  definition  of  the  abstraction  function. 
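
The methods rule for push and put can also be checked mechanically over a small, finite set of stack values.
The following Python sketch is an illustration under stated assumptions, not a proof: the specification of
put is not repeated in this section, so put_pre and put_post below assume that put requires
$|b_{pre}.elems| < b_{pre}.bound$ and ensures that the element is added while the bound is unchanged. All
function names are hypothetical.

```python
from collections import Counter
from itertools import product


def A(items, limit):
    # Abstraction function of Figure 3: (sequence, limit) -> (multiset, bound).
    return Counter(items), limit


def put_pre(elems, bound):
    # Assumed pre-condition of put: the bag is not full.
    return sum(elems.values()) < bound


def push_pre(items, limit):
    return len(items) < limit


def push_post(pre, post, i):
    (items0, limit0), (items1, limit1) = pre, post
    return items1 == items0 + (i,) and limit1 == limit0


def put_post(pre, post, i):
    # Assumed post-condition of put: i is added and the bound is unchanged.
    (elems0, bound0), (elems1, bound1) = pre, post
    return elems1 == elems0 + Counter([i]) and bound1 == bound0


def legal_stacks(max_limit=3, values=(0, 1)):
    # All stack values satisfying the invariant length(items) <= limit.
    for limit in range(max_limit + 1):
        for n in range(limit + 1):
            for items in product(values, repeat=n):
                yield items, limit


# Pre-condition rule: put's pre-condition on A(s) implies push's on s.
pre_rule = all(push_pre(*s) for s in legal_stacks() if put_pre(*A(*s)))

# Post-condition rule: push's post-condition implies put's on the abstracted values.
post_rule = True
for s in legal_stacks():
    if not push_pre(*s):
        continue
    for i in (0, 1):
        t = (s[0] + (i,), s[1])  # the post-state described by push's ensures clause
        assert push_post(s, t, i)
        post_rule = post_rule and put_post(A(*s), A(*t), i)

print(pre_rule, post_rule)  # True True
```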

The invariant rule requires that we show that the invariant on stacks:

$length(s_\rho.items) \le s_\rho.limit$

implies that on bags:

$|A(s_\rho).elems| \le A(s_\rho).bound$

Footnote: Note that we are reasoning in terms of the values of the object, s, and that b and s refer to the
same object (b appears in the bag specification).

We can show this by a simple proof by induction on the length of the sequence of a bounded stack.

The constraint rule requires that we show that the constraint on stacks:

$s_\rho.limit = s_\psi.limit$

implies that on bags:

$A(s_\rho).bound = A(s_\psi).bound$

This is true because $A$ maps the limit of a stack to the bound of its bag counterpart, so that
$A(s_\rho).bound = s_\rho.limit$ and $A(s_\psi).bound = s_\psi.limit$.

Note that we do not have to say anything specific for swap_top; it is taken care of just like all the other
methods  when  we  show  that  the  specification  of  stack  satisfies  its  invariant  and  constraint. 
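
To make the role of the invariant and constraint rules concrete, the following Python sketch checks both
rules along one hypothetical history of a stack that includes a swap_top step. It only samples a single
computation, so it illustrates the rules rather than proving them; the history and all function names are
assumptions of this sketch.

```python
from collections import Counter
from itertools import combinations


def A(s):
    # Abstraction function of Figure 3 on (items, limit) pairs.
    items, limit = s
    return Counter(items), limit


def stack_invariant(s):
    items, limit = s
    return len(items) <= limit


def bag_invariant(b):
    elems, bound = b
    return sum(elems.values()) <= bound


def stack_constraint(s_rho, s_psi):
    return s_rho[1] == s_psi[1]


def bag_constraint(b_rho, b_psi):
    return b_rho[1] == b_psi[1]


# Hypothetical history of one stack: push 7, push 8, swap_top 9, pop.
history = [((), 2), ((7,), 2), ((7, 8), 2), ((7, 9), 2), ((7,), 2)]

# Invariant rule: the stack invariant implies the bag invariant on A(s).
invariant_rule = all(bag_invariant(A(s)) for s in history if stack_invariant(s))

# Constraint rule: the stack constraint implies the bag constraint on the abstracted pair.
constraint_rule = all(
    bag_constraint(A(r), A(p))
    for r, p in combinations(history, 2)
    if stack_constraint(r, p)
)

print(invariant_rule, constraint_rule)  # True True
```

No step of the check mentions swap_top by name: the extra mutator is covered because every state it
produces satisfies the stack invariant and constraint.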


6  Type  Hierarchies 

The  requirement  we  impose  on  subtypes  is  very  strong  and  raises  a  concern  that  it  might  rule  out  many 
useful  subtype  relations.  To  address  this  concern  we  looked  at  a  number  of  examples.  We  found  that  our 
technique  captures  what  people  want  from  a  hierarchy  mechanism,  but  we  also  discovered  some  surprises. 

The  examples  led  us  to  classify  subtype  relationships  into  two  broad  categories.  In  the  first  category, 
the  subtype  extends  the  supertype  by  providing  additional  methods  and  possibly  additional  “state.”  In 
the  second,  the  subtype  is  more  constrained  than  the  supertype.  We  discuss  these  relationships  below.  In 
practice,  many  type  families  will  exhibit  both  kinds  of  relationships. 

6.1  Extension  Subtypes 

A  subtype  extends  its  supertype  if  its  objects  have  extra  methods  in  addition  to  those  of  the  supertype. 
Abstraction  functions  for  extension  subtypes  are  onto,  i.e.,  the  range  of  the  abstraction  function  is  the  set  of 
all  legal  values  of  the  supertype.  The  subtype  might  simply  have  more  methods;  in  this  case  the  abstraction 
function  is  one-to-one.  Or  its  objects  might  also  have  more  “state,”  i.e.,  they  might  record  information  that 
is  not  present  in  objects  of  the  supertype;  in  this  case  the  abstraction  function  is  many-to-one. 

As  an  example  of  the  one-to-one  case,  consider  a  type  intset  (for  set  of  integers)  with  methods  to  insert 
and  delete  elements,  to  select  elements,  and  to  provide  the  size  of  the  set.  A  subtype,  intset2,  might  have 
more methods, e.g., union, is_empty. Here there is no extra state, just extra methods. Suppose intset’s
invariant and constraints are both trivial; intset2’s would be as well. Thus, proving that intset2 preserves
intset’s  invariant  and  constraint  is  trivial. 

It is easy to discover when a proposed subtype really is not one. For example, the fat_set type discussed
earlier has an insert method but no delete method. Intset is not a subtype of fat_set because fat_sets only
grow while intsets grow and shrink; intset does not preserve various history properties of fat_set, in particular,
the constraint that once some integer is in the fat_set, it remains in the fat_set. The attempt to show that
the intset constraint (which is trivial) implies that of fat_set would fail.

As  a  simple  example  of  a  many-to-one  case,  consider  immutable  pairs  and  triples  (Figure  5).  Pairs  have 
methods  that  fetch  the  first  and  second  elements;  triples  have  these  methods  plus  an  additional  one  to  fetch 
the third element. Triple is a subtype of pair and so is semi-mutable triple with methods to fetch the first,
second,  and  third  elements  and  to  replace  the  third  element  because  replacing  the  third  element  does  not 
affect  the  first  or  second  element.  This  example  shows  that  it  is  possible  to  have  a  mutable  subtype  of  an 
immutable  supertype,  provided  the  mutations  are  invisible  to  users  of  the  supertype. 

Mutations  of  a  subtype  that  would  be  visible  through  the  methods  of  an  immutable  supertype  are  ruled 
out.  For  example,  an  immutable  sequence,  whose  elements  can  be  fetched  but  not  stored,  is  not  a  supertype 
of  mutable  array,  which  provides  a  store  method  in  addition  to  the  sequence  methods.  For  sequences  we  can 
prove  elements  do  not  change;  this  is  not  true  for  arrays.  The  attempt  to  construct  the  subtype  relation  will 
fail  because  the  constraint  for  sequences  does  not  follow  from  that  for  arrays. 

Many  examples  of  extension  subtypes  are  found  in  the  literature.  One  common  example  concerns  persons, 
employees,  and  students  (Figure  6).  A  person  object  has  methods  that  report  its  properties  such  as  its  name, 
age,  and  possibly  its  relationship  to  other  persons  (e.g.,  its  parents  or  children).  Student  and  employee  are 
[Figure 5: Pairs and Triples (hierarchy diagram: immutable pair, with subtypes immutable triple and
semi-mutable triple)]


subtypes of person; in each case they have additional properties, e.g., a student id number, an employee
employer and salary. In addition, type student_employee is a subtype of both student and employee (and
also  person,  since  the  subtype  relation  is  transitive).  In  this  example,  the  subtype  objects  have  more  state 
than  those  of  the  supertype  as  well  as  more  methods. 


[Figure 6: Person, Student, and Employee (hierarchy diagram: person at the root, student and employee
below it, and student_employee below both)]


Another  example  from  the  database  literature  concerns  different  kinds  of  ships  [HM81].  The  supertype  is 
generic  ships  with  methods  to  determine  such  things  as  who  is  the  captain  and  where  the  ship  is  registered. 
Subtypes  contain  more  specialized  ships  such  as  tankers  and  freighters.  There  can  be  quite  an  elaborate 
hierarchy (e.g., tankers are a special kind of freighter). Windows are another well-known example [HO87];
subtypes  include  bordered  windows,  colored  windows,  and  scrollable  windows. 

Common  examples  of  subtype  relationships  are  allowed  by  our  definition  provided  the  equal  method  (and 
other similar methods) are defined properly in the subtype. Suppose supertype $\tau$ provides an equal method
and consider a particular call x.equal(y). The difficulty arises when x and y actually belong to $\sigma$, a subtype
of  r.  If  objects  of  the  subtype  have  additional  state,  x  and  y  may  differ  when  considered  as  subtype  objects 
but  ought  to  be  considered  equal  when  considered  as  supertype  objects. 

For example, consider immutable triples x = (0, 0, 0) and y = (0, 0, 1). Suppose the specification of the
equal method for pairs says:

equal = proc (q: pair) returns (bool)
    ensures result = ($p.first = q.first \wedge p.second = q.second$)

(We are using p to refer to the method’s object.) However, we would expect two triples to be equal only if
their  first,  second,  and  third  components  were  equal.  If  a  program  using  triples  had  just  observed  that  x  and 
y  differ  in  their  third  element,  we  would  expect  x.equal(y)  to  return  “false,”  but  if  the  program  were  using 
them  as  pairs,  and  had  just  observed  that  their  first  and  second  elements  were  equal,  it  would  be  wrong  for 
the  equal  method  to  return  false. 

The  way  to  resolve  this  dilemma  is  to  have  two  equal  methods  in  triple: 

pair_equal = proc (q: pair) returns (bool)
    ensures result = ($p.first = q.first \wedge p.second = q.second$)

triple_equal = proc (q: triple) returns (bool)
    ensures result = ($p.first = q.first \wedge p.second = q.second \wedge p.third = q.third$)

One of them (pair_equal) simulates the equal method for pair; the other
(triple_equal) is a method just on triples. (In some object-oriented languages, such as Java, the additional
equal methods are obtained by overloading.)
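
A minimal Python sketch of the two equality methods follows. The class layout is an assumption made for
illustration (the specifications above do not prescribe an implementation), and Python has no overloading,
so the two methods simply carry different names.

```python
from dataclasses import dataclass


@dataclass(frozen=True, eq=False)
class Pair:
    first: int
    second: int

    def equal(self, q: "Pair") -> bool:
        # Compares only the state visible through the pair interface.
        return self.first == q.first and self.second == q.second


@dataclass(frozen=True, eq=False)
class Triple(Pair):
    third: int

    def pair_equal(self, q: Pair) -> bool:
        # Simulates pair's equal method; accepts any pair, including triples.
        return Pair.equal(self, q)

    def triple_equal(self, q: "Triple") -> bool:
        # A method just on triples: all three components must agree.
        return self.pair_equal(q) and self.third == q.third


x = Triple(0, 0, 0)
y = Triple(0, 0, 1)
print(x.pair_equal(y))    # True
print(x.triple_equal(y))  # False
```

A program that treats x and y as pairs observes that they are equal, while a program that knows they are
triples can still distinguish them.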

The  problem  is  not  limited  to  equality  methods.  It  also  affects  methods  that  “expose”  the  abstract  state 
of  objects,  e.g.,  an  unparse  method  that  returns  a  string  representation  of  the  abstract  state  of  its  object. 
x.unparse() ought to return a representation of a pair if called in a context in which x is considered to be a
pair,  but  it  ought  to  return  a  representation  of  a  triple  in  a  context  in  which  x  is  known  to  be  a  triple  (or 
some  subtype  of  triple). 

The  need  for  several  equality  methods  seems  natural  for  realistic  examples.  For  example,  asking  whether 
e1 and e2 are the same person is different from asking if they are the same employee. In the case of a person
holding  two  jobs,  the  answer  might  be  true  for  the  question  about  person  but  false  for  the  question  about 
employee. 

6.2  Constrained  Subtypes 

The  second  kind  of  subtype  relation  occurs  when  the  subtype  is  more  constrained  than  the  supertype.  In  this 
case,  the  supertype  specification  is  written  in  a  way  that  allows  variation  in  behavior  among  its  subtypes. 
Subtypes  constrain  the  supertype  by  reducing  the  variability.  The  abstraction  function  is  usually  into  rather 
than  onto.  The  subtype  may  extend  those  supertype  objects  that  it  simulates  by  providing  additional 
methods  and/or  state. 

Since  constrained  subtypes  reduce  variation,  it  is  crucial  when  defining  this  kind  of  type  hierarchy  to 
think  carefully  about  what  variability  is  permitted  for  the  subtypes.  The  variability  will  show  up  in  the 
supertype specifications in two ways: in the invariant and constraint, and also in the specifications of the
individual methods. In both cases the supertype definitions will be nondeterministic in those places where
different subtypes are expected to provide different behavior.

A  very  simple  example  concerns  elephants.  Elephants  come  in  many  colors  (realistically  grey  and  white, 
but  we  will  also  allow  blue  ones).  However  all  albino  elephants  are  white  and  all  royal  elephants  are  blue! 
Figure  7  shows  the  elephant  hierarchy.  The  set  of  legal  values  for  regular  elephants  includes  all  elephants 
whose  color  is  grey  or  blue  or  white: 

invariant $e_\rho.color = white \,\vee\, e_\rho.color = grey \,\vee\, e_\rho.color = blue$

The set of legal values for royal elephants is a subset of those for regular elephants:

invariant $e_\rho.color = blue$

and hence the abstraction function is into. The situation for albino elephants is similar. Furthermore, the
elephant method that returns the color (if there is such a method) can return grey or blue or white, i.e.,
it is nondeterministic; the subtypes restrict the nondeterminism for this method by defining it to return a
specific color.

This  simple  example  has  led  others  to  define  a  subtyping  relation  that  requires  non-monotonic  reasoning 
[Lip92],  but  we  believe  it  is  better  to  use  variability  in  the  supertype  specification  and  straightforward 
reasoning  methods.  However,  the  example  shows  that  a  specifier  of  a  type  family  has  to  anticipate  subtypes 
and  capture  the  variation  among  them  in  the  specification  of  the  supertype. 
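
The invariant rule for the elephant family can be sketched in Python by enumerating colors. The sketch
assumes the abstraction function is the identity on colors; the pink subtype is hypothetical and is included
only to show what happens when the supertype specification did not anticipate a subtype.

```python
ELEPHANT_COLORS = {"white", "grey", "blue"}


def elephant_invariant(color):
    return color in ELEPHANT_COLORS


def royal_invariant(color):
    return color == "blue"


def albino_invariant(color):
    return color == "white"


def pink_invariant(color):
    # Hypothetical subtype that the supertype specification did not anticipate.
    return color == "pink"


def invariant_rule(sub_invariant, universe):
    # Subtype invariant implies supertype invariant (identity abstraction function).
    return all(elephant_invariant(c) for c in universe if sub_invariant(c))


universe = ELEPHANT_COLORS | {"pink"}
print(invariant_rule(royal_invariant, universe))   # True
print(invariant_rule(albino_invariant, universe))  # True
print(invariant_rule(pink_invariant, universe))    # False
```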

The bag type discussed in Section 4.1 has two kinds of variability. First, as discussed earlier, the
specification of get is nondeterministic because it does not constrain which element of the bag is removed. This
nondeterminism allows stack to be a subtype of bag: the specification of pop constrains the
nondeterminism. We could also define a queue that is a subtype of bag; its dequeue method would also constrain the
nondeterminism  of  get  but  in  a  way  different  from  pop. 

In  addition,  the  actual  value  of  the  bound  for  bags  is  not  defined;  it  can  be  any  natural  number,  thus 
allowing  subtypes  to  have  different  bounds.  This  variability  shows  up  in  the  specification  of  put,  where  we 
[Figure 7: Elephant Hierarchy (hierarchy diagram: elephant at the root, with albino and royal elephants
as subtypes)]


do  not  say  what  specific  bound  value  causes  the  call  to  fail.  Therefore,  a  user  of  put  must  be  prepared  for  a 
failure.  (Of  course  the  user  could  deduce  that  a  particular  call  will  succeed,  based  on  a  previous  sequence  of 
method  calls  and  the  constraint  that  the  bound  of  a  bag  does  not  change.)  A  subtype  of  bag  might  limit  the 
bound  to  a  fixed  value,  or  to  a  smaller  range.  Several  subtypes  of  bag  are  shown  in  Figure  8;  mediumbags 
have  various  bounds,  so  that  this  type  might  have  its  own  subtypes,  e.g.,  bag_150. 


bag
    largebag   (bound = 2^[...])
    mediumbag  (100 <= bound <= 1000)
        bag_150  (bound = 150)
    smallbag   (bound = 20)

Figure 8: A Type Family for Bags


The  bag  hierarchy  may  seem  counterintuitive,  since  we  might  expect  that  bags  with  smaller  bounds 
should  be  subtypes  of  bags  with  larger  bounds.  For  example,  we  might  expect  smallbag  to  be  a  subtype 
of  largebag.  However,  the  specifications  for  the  two  types  are  incompatible:  the  bound  of  every  largebag  is 
2^^,  which  is  clearly  not  true  for  smallbags.  Furthermore,  this  difference  is  observable  via  the  methods:  It  is 
legal  to  call  the  put  method  on  a  largebag  whose  size  is  greater  than  or  equal  to  20,  but  the  call  is  not  legal 
for  a  smallbag.  Therefore  the  pre-condition  rule  is  not  satisfied. 
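
The failure of the pre-condition rule can be exhibited with a small Python sketch. It assumes that put
requires the size to be strictly less than the bound, and it uses LARGE_BOUND = 1000 purely as a stand-in
for the largebag bound; both names and the stand-in value are assumptions of this sketch.

```python
SMALL_BOUND = 20
LARGE_BOUND = 1000  # stand-in value for illustration only


def put_pre(size, bound):
    # Assumed pre-condition of put: the bag is not full.
    return size < bound


# Legal smallbag sizes are 0..SMALL_BOUND. A counterexample is a size at which the
# supertype (largebag) pre-condition holds but the subtype (smallbag) one does not.
counterexamples = [
    n
    for n in range(SMALL_BOUND + 1)
    if put_pre(n, LARGE_BOUND) and not put_pre(n, SMALL_BOUND)
]
print(counterexamples)  # [20]
```

A full smallbag is the witness: a caller reasoning with the largebag specification would consider the call
legal, yet the smallbag cannot accept it.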

Although  the  bag  type  can  have  subtypes  with  different  bounds,  it  cannot  have  subtypes  where  the 
bounds  of  the  bags  can  change  dynamically.  If  we  wanted  a  type  family  that  included  both  bag  and  such 
dynamic  bags,  we  would  need  to  define  a  supertype  in  which  the  bound  is  allowed,  but  not  required,  to 
vary. Figure 9 shows the new type hierarchy. Dynamic_bags have a bound that tracks the size: each time an
element is added or removed from a dynamic_bag, the bound changes to match the new size. Flexible_bags
have an additional mutator, change_bound:

change_bound = proc (n: int)
    requires $n \ge |b_{pre}.elems|$
    modifies b
    ensures $b_{post}.elems = b_{pre}.elems \;\wedge\; b_{post}.bound = n$

Notice that other types in the family need not have a change_bound method.

This example illustrates the different ways that subtypes reduce variability. All varying_bag subtypes
reduce variability in the specification for the put method; varying_bag’s put method is non-deterministic,
varying_bag
    I: size <= bound
    C: true

    flexible_bag          dynamic_bag          bag
    I: size <= bound      I: size = bound      I: size <= bound
    C: true                                    C: bound stays the same

                                               [...as in Fig. 8...]

Figure 9: Another Type Family for Bags


since it might add the element (and change the bound) if the size is the same as the bound, or it might
not. Bag and flexible_bag reduce this variability by not adding the element, whereas dynamic_bag does
add the element. In addition, bag reduces variability by restricting the constraint: the trivial constraint
for varying_bag can be thought of as stating “either a bag’s bound may change or it stays the same;” the
constraint for bag reduces this variability by making a choice (“the bag’s bound stays the same”) and users
can then rely on this property for bags and its subtypes. Dynamic_bag reduces variability by restricting
varying_bag’s invariant so that it no longer allows the size to be less than the bound. Finally, flexible_bag
reduces variability because of the extra mutator, change_bound; all its subtypes must allow explicit re-setting
of the bound.

Another  example  is  a  family  of  integer  counters  shown  in  Figure  10.  When  a  counter  is  advanced,  we 
only  know  that  its  value  gets  bigger,  so  that  the  constraint  is  simply 

constraint $c_\rho \le c_\psi$

The  doubler  and  multiplier  subtypes  have  stronger  constraints.  For  example,  a  multiplier’s  value  always 
increases  by  a  multiple,  so  that  its  constraint  is: 

constraint $\exists n : int \,.\; [\, n > 0 \;\wedge\; c_\psi = n * c_\rho \,]$

For  a  family  like  this,  we  might  choose  to  have  an  advance  method  for  counter  (so  that  each  of  its  subtypes  is 
constrained  to  have  this  method)  or  we  might  not.  If  we  do  provide  an  advance  method,  its  specification  will 
have to be nondeterministic (i.e., it merely states that the size of the counter grows) to allow the subtypes to
provide  the  definitions  that  are  appropriate  for  them. 
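
The constraint rule for the counter family can be sketched in Python, taking the abstraction function to
be the identity. The sketch assumes counter values are positive integers; this section does not state the
counter invariant, so that restriction is an assumption of the sketch, as are the function names.

```python
def counter_constraint(c_rho, c_psi):
    # The value never decreases.
    return c_rho <= c_psi


def multiplier_constraint(c_rho, c_psi):
    # There exists an integer n > 0 with c_psi == n * c_rho (for positive c_rho).
    return c_psi % c_rho == 0 and c_psi // c_rho > 0


values = range(1, 30)  # assumed: counters hold positive integers

# Constraint rule: the multiplier constraint implies the counter constraint.
implied = all(
    counter_constraint(r, p)
    for r in values
    for p in values
    if multiplier_constraint(r, p)
)
print(implied)  # True

# The converse fails, which is why the multiplier constraint is the stronger one.
print(multiplier_constraint(2, 3), counter_constraint(2, 3))  # False True
```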

In  the  case  of  the  bag  family  illustrated  in  Figure  8,  all  types  in  the  hierarchy  might  be  “real”  in  the  sense 
that they have objects. However, sometimes supertypes are virtual; they define the properties all subtypes
have  in  common  but  have  no  objects  of  their  own.  Varying_bag  of  Figure  9  might  be  such  a  type. 

Virtual  types  are  useful  in  many  type  hierarchies.  For  example,  we  would  use  them  to  construct  a 
hierarchy  for  integers.  Smaller  integers  cannot  be  a  subtype  of  larger  integers  because  of  observable  differences 
in  behavior;  for  example,  an  overflow  exception  that  would  occur  when  adding  two  32-bit  integers  would 
not  occur  if  they  were  64-bit  integers.  Also,  larger  integers  cannot  be  a  subtype  of  smaller  ones  because 
exceptions  do  not  occur  when  expected.  However,  we  clearly  would  like  integers  of  different  sizes  to  be 
related.  This  is  accomplished  by  designing  a  virtual  supertype  that  includes  them.  Such  a  hierarchy  is 
shown  in  Figure  11,  where  integer  is  a  virtual  type  whose  invariant  simply  says  that  the  size  of  an  integer  is 
greater  than  zero.  Integer  types  with  different  sizes  are  subtypes  of  integer.  In  addition,  small  integer  types 
are subtypes of regular_int, another virtual type; the invariant in the specification for regular_int states that
the  size  of  an  integer  is  either  16  bits  or  32  bits.  An  integer  family  might  have  a  structure  like  this,  or  it 
might  be  flatter  by  having  all  integer  types  be  direct  subtypes  of  integer. 
counter  (value never decreases)
    incrementer  (value never decreases)
    doubler      (value doubles)
    multiplier   (value multiplies)

Figure 10: Type Family for Counters


[Figure 11: Integer Family (hierarchy diagram: integer at the root, with 64-bit-int and regular_int
below it, and the 16-bit and 32-bit integer types below regular_int)]


7  Related  Work 

Some  research  on  defining  subtype  relations  is  concerned  with  capturing  constraints  on  method  signatures  via 
the contra/covariance rules, such as those used in languages like Trellis/Owl [SCB+86], Emerald [BHJ+87],
Quest  [Car88],  Eiffel  [Mey88],  POOL  [Ame90],  and  to  a  limited  extent  Modula-3  [Nel91].  Our  rules  place 
constraints  not  just  on  the  signatures  of  an  object’s  methods,  but  also  on  their  behavior. 

Our work is most similar to that of America [Ame91], who has proposed rules for determining, based
on type specifications, whether one type is a subtype of another. Meyer [Mey88] also uses pre- and
postcondition rules similar to America’s and ours. Cusack’s [Cus91] approach of relating type specifications
defines subtyping in terms of strengthening state invariants. However, none of these authors considers either
the problems introduced by extra mutators or the preservation of history properties. Therefore, they allow
certain subtype relations that we forbid (e.g., intset could be a subtype of fat_set in these approaches).
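
The following sketch suggests why pre- and postcondition rules alone do not rule out such a relation. It assumes, for illustration only, that fat_set offers just insertion and a membership test and carries the history property that elements are never removed, while intset adds a delete mutator. The class names, method names, and the client function `still_has_seven` are hypothetical.

```python
class FatSet:
    """Hypothetical fat_set: it only grows.

    Assumed history property: once an element is in the set, it stays there.
    """

    def __init__(self):
        self._elems = set()

    def insert(self, x):
        self._elems.add(x)

    def contains(self, x):
        return x in self._elems


class IntSet(FatSet):
    """Hypothetical intset: same insert and contains, plus an extra mutator."""

    def delete(self, x):
        self._elems.discard(x)


def still_has_seven(s, other_user):
    """Client code written against FatSet; it relies on the history property."""
    s.insert(7)
    other_user(s)  # another part of the program shares the same object
    return s.contains(7)
```

Every method that `IntSet` inherits behaves exactly as `FatSet` specifies, so a check of pre- and postconditions passes. The client's expectation still fails when the other user of the object calls `delete`, which is the kind of history property violation that an explicit constraint is meant to catch.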

Our  use  of  constraints  in  place  of  the  history  rule  is  one  of  two  techniques  discussed  in  [LW94].  That 
paper  proposes  a  second  technique  in  which  there  is  no  constraint;  instead,  extra  methods  are  not  allowed 
to  introduce  new  behavior.  It  requires  that  the  behavior  of  each  extra  mutator  be  “explained”  in  terms  of 
existing  behavior,  through  existing  methods.  We  believe  the  use  of  constraints  is  simpler  and  easier  to  reason 
about  than  this  “explanation”  approach. 

The  emphasis  on  semantics  of  abstract  types  is  a  prominent  feature  of  the  work  by  Leavens.  In  his  Ph.D. 
thesis  Leavens  [Lea89]  defines  types  in  terms  of  algebras  and  subtyping  in  terms  of  a  simulation  relation 
between  them.  His  simulation  relations  are  a  more  general  form  of  our  abstraction  functions.  Leavens 
considered  only  immutable  types.  Dhara  [Dha92,  DL92,  LD92]  extends  Leavens’  thesis  work  to  deal  with 
mutable  types,  but  rules  out  the  cases  where  extra  methods  cause  problems,  e.g.,  aliasing.  Because  of  their 
restrictions  they  allow  some  subtype  relations  to  hold  where  we  do  not.  For  example,  they  allow  mutable 
pairs  to  be  a  subtype  of  immutable  pairs  whereas  we  do  not. 

Others  have  worked  on  the  specification  of  types  and  subtypes.  For  example,  many  have  proposed  Z  as 
the basis of specifications of object types [CL91, DD90, CDD+89]; Goguen and Meseguer [GM87] use FOOPS;
Leavens and his colleagues use Larch [Lea91, LW90, DL92]. Though several of these researchers separate the
specification  of  an  object’s  creators  from  its  other  methods,  none  has  identified  the  problem  posed  by  the 
missing creators, and thus none has provided an explicit solution to this problem.

8  Summary 

We  defined  a  new  notion  of  the  subtype  relation  based  on  the  semantic  properties  of  the  subtype  and 
supertype. An object’s type determines both a set of legal values and an interface with its environment
(through  calls  on  its  methods).  Thus,  we  are  interested  in  preserving  properties  about  supertype  values 
and  methods  when  designing  a  subtype.  We  require  that  a  subtype  preserve  the  behavior  of  the  supertype 
methods  and  also  all  invariant  and  history  properties  of  its  supertype.  We  are  particularly  interested  in  an 
object’s  observable  behavior  (state  changes),  thus  motivating  our  focus  on  history  properties  and  on  mutable 
types  and  mutators. 

We  also  presented  a  way  to  specify  the  semantic  properties  of  types  formally.  One  reason  we  chose  to 
base  our  approach  on  Larch  is  that  Larch  allows  formal  proofs  to  be  done  entirely  in  terms  of  specifications. 
In  fact,  once  the  theorems  corresponding  to  our  subtyping  rules  are  formally  stated  in  Larch,  their  proofs  are 
almost completely mechanical (a matter of symbol manipulation) and could be done with the assistance of
the Larch Prover [GG89, ZW97].

In  developing  our  definition,  we  were  motivated  primarily  by  pragmatics.  Our  intention  is  to  capture 
the  intuition  programmers  apply  when  designing  type  hierarchies  in  object-oriented  languages.  However, 
intuition  in  the  absence  of  precision  can  often  go  astray  or  lead  to  confusion.  This  is  why  it  has  been  unclear 
how to organize certain type hierarchies such as integers. Our definition sheds light on such hierarchies
and  helps  in  uncovering  new  designs.  It  also  supports  the  kind  of  reasoning  that  is  needed  to  ensure  that 
programs  that  work  correctly  using  the  supertype  continue  to  work  correctly  with  the  subtype. 

Programmers  have  found  our  approach  relatively  easy  to  apply  and  use  it  primarily  in  an  informal  way. 
The  essence  of  a  subtype  relationship  is  expressed  in  the  mappings.  These  mappings  can  be  defined  informally, 
in  much  the  same  way  that  abstraction  functions  and  representation  invariants  are  given  as  comments  in  a 
program  that  implements  an  abstract  type.  The  proofs  can  also  be  done  informally,  in  the  style  given  in 
Section  5.3;  they  are  usually  straightforward  and  can  be  done  by  inspection. 
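
As a rough illustration of mappings written as comments, the sketch below assumes a bag supertype with methods put and get and a stack subtype. The class `Stack`, its methods, and the helper `abstraction` are hypothetical names chosen for this example, and the use of `collections.Counter` as a multiset is likewise an assumption.

```python
from collections import Counter


class Stack:
    """Hypothetical subtype of a bag supertype with methods put and get.

    Mappings, stated informally as comments:
      abstraction function: A(stack) = the multiset of the stack's elements
      renaming:             push -> put, pop -> get
    Representation invariant: self._items is a list, top of stack last.
    """

    def __init__(self):
        self._items = []

    def push(self, x):
        # Must satisfy put's postcondition: A(post) = A(pre) + {x}.
        self._items.append(x)

    def pop(self):
        # Must satisfy get's postcondition: some element is removed and returned.
        return self._items.pop()


def abstraction(stack):
    """The abstraction function A, made executable for spot checks."""
    return Counter(stack._items)
```

Making the abstraction function executable is optional. It simply lets the informal argument (push adds one element to the bag value, pop removes one) be spot-checked on sample objects.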

We  also  showed  that  our  approach  is  useful  by  looking  at  a  number  of  examples.  This  led  us  to  identify 
two  kinds  of  subtypes:  ones  that  extend  the  supertype,  and  ones  that  constrain  it.  In  the  former  case,  the 
supertype  can  be  defined  without  a  great  deal  of  thought  about  the  subtypes,  but  in  the  latter  case,  this  is 
not  possible;  instead  the  supertype  specification  must  be  done  carefully  so  that  it  allows  all  of  the  intended 
subtypes.  In  particular  the  specification  of  the  supertype  must  contain  sufficient  nondeterminism  in  the 
invariant,  constraint,  and  method  specifications. 
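
The counter family of Figure 10 gives a small picture of this kind of nondeterminism. In the sketch below the supertype only promises that the value never decreases and leaves the amount of growth open, so each subtype can fix its own rule. The class names, the `advance` method, the checker `never_decreases`, and the choice that an incrementer adds exactly one are assumptions made for illustration.

```python
from abc import ABC, abstractmethod


class CounterType(ABC):
    """Supertype with a deliberately loose specification.

    Constraint: the value never decreases. advance() may grow the value by
    any amount, which leaves room for all of the intended subtypes.
    """

    def __init__(self, start=1):
        assert start >= 0
        self.value = start

    @abstractmethod
    def advance(self):
        """Move to the next value; must not decrease self.value."""


class Incrementer(CounterType):
    def advance(self):
        self.value += 1  # assumed behavior: add one


class Doubler(CounterType):
    def advance(self):
        self.value *= 2


class Multiplier(CounterType):
    def __init__(self, factor, start=1):
        assert factor >= 1  # needed so that the constraint holds
        super().__init__(start)
        self.factor = factor

    def advance(self):
        self.value *= self.factor


def never_decreases(counter, steps=5):
    """Check the supertype constraint over a short history of states."""
    history = [counter.value]
    for _ in range(steps):
        counter.advance()
        history.append(counter.value)
    return all(a <= b for a, b in zip(history, history[1:]))
```

Had the supertype instead specified that `advance` adds exactly one, neither the doubler nor the multiplier could be a subtype. A run of `never_decreases` is only a spot check on one history, not a proof of the constraint.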

Our  analysis  raises  two  issues  about  type  hierarchy  that  have  been  ignored  previously  by  both  the 
formal  methods  and  object-oriented  communities.  First,  subtypes  can  have  more  methods,  specifically  more 
mutators,  than  their  supertypes.  Second,  subtypes  need  to  have  different  creators  than  supertypes.  These 
issues  forced  us  to  revisit  proof  rules  normally  associated  with  type  specifications:  the  data  type  induction 
rule  and  the  history  rule.  We  decided  to  preclude  the  use  of  these  rules,  and  to  have  explicit  invariants 
and  constraints  to  replace  them.  Although  it  is  possible  to  define  a  subtype  relation  that  avoids  explicit 
invariants  and  constraints,  doing  so  is  awkward  and  often  requires  invention  of  superfluous  supertype  methods 
and  creators.  We  prefer  to  use  explicit  invariants  and  constraints  because  this  allows  a  more  direct  way  of 
capturing  the  designer’s  intent. 